{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# code feedback"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from tqdm import tqdm\n",
    "def read_jsonl(file_path):\n",
    "    with open(file_path, 'r', encoding='utf-8') as file:\n",
    "        for line in file:\n",
    "            # 去掉行末的换行符，并解析为 JSON 对象\n",
    "            json_obj = json.loads(line.strip())\n",
    "            yield json_obj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "66383\n"
     ]
    }
   ],
   "source": [
    "data_dir = \"/data/data/fintuning/Code-Feedback/Code-Feedback.jsonl\"\n",
    "code_feedback_save_data = []\n",
    "for json_obj in read_jsonl(data_dir):\n",
    "    conversations_list = []\n",
    "    message_list = json_obj[\"messages\"]\n",
    "    for  message in message_list:\n",
    "        result = {}\n",
    "        if message[\"role\"]=='user':\n",
    "            result['from'] = 'human'\n",
    "        elif message[\"role\"]=='assistant':\n",
    "            result[\"from\"] = 'gpt'\n",
    "        else:\n",
    "            raise NameError(message['role'])\n",
    "        result['value'] = message['content']\n",
    "        conversations_list.append(result)\n",
    "    code_feedback_save_data.append({\n",
    "        \"conversations\": conversations_list\n",
    "    })\n",
    "print(len(code_feedback_save_data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'conversations': [{'from': 'human', 'value': \"Please write a Python program that illustrates the real-world application of a do-while loop over a traditional for loop. Discuss the benefits and challenges, if any, of utilizing a do-while loop in Python when compared to a for loop, especially in complex problem-solving scenarios. Analyze situations when it's more advantageous to use a do-while loop and provide supporting examples.\"}, {'from': 'gpt', 'value': 'Sure! Here\\'s a Python program that demonstrates the real-world application of a do-while loop:\\n\\n```python\\n# Simulating a login process with a do-while loop\\ndef login():\\n    username = input(\"Enter your username: \")\\n    password = input(\"Enter your password: \")\\n    authenticated = False\\n\\n    # Perform login until authentication is successful\\n    while not authenticated:\\n        # Check username and password\\n        if username == \"admin\" and password == \"password\":\\n            authenticated = True\\n            print(\"Login successful!\")\\n        else:\\n            print(\"Invalid username or password. Please try again.\")\\n\\n# Call the login function\\nlogin()\\n```\\n\\nIn this example, a do-while loop is simulated using a while loop. The program prompts the user to enter their username and password for authentication. It validates the entered credentials and continues prompting until the user provides the correct ones. Once the authentication is successful, the program prints a success message.\\n\\nNow, let\\'s discuss the benefits and challenges of utilizing a do-while loop compared to a for loop in Python, especially in complex problem-solving scenarios.\\n\\nBenefits of using a do-while loop:\\n1. Ensures at least one iteration: Unlike a for loop where the loop condition is checked before the loop starts, a do-while loop guarantees at least one iteration. This is useful when you need to execute a block of code at least once, regardless of the condition.\\n2. Simplifies complex conditions: A do-while loop allows you to simplify complex loop conditions by checking the condition at the end of each iteration. This can make the code more readable and manageable, especially in scenarios where there are multiple conditions involved.\\n\\nChallenges of using a do-while loop in Python:\\n1. Lack of built-in support: Python doesn\\'t have a built-in do-while loop construct like some other programming languages. Therefore, you need to simulate a do-while loop using a while loop with an additional conditional check.\\n2. Potential infinite loops: If the loop condition is not properly handled, there is a risk of creating an infinite loop that can crash the program. It\\'s essential to ensure that the loop condition eventually becomes false to exit the loop.\\n\\nIt is more advantageous to use a do-while loop in scenarios where you need to execute a block of code at least once and then continue based on a condition. For example, in a login process like the one shown above, you want to prompt the user for credentials at least once and then validate them. The do-while loop guarantees that the login process will execute at least once.\\n\\nIn complex problem-solving scenarios, a do-while loop can be useful when you need to collect input from the user or validate user data until a specific condition is met. It allows you to repeat a block of code until the desired state is achieved.\\n\\nOverall, while Python doesn\\'t have a built-in do-while loop, you can simulate it using a while loop. The choice between a do-while loop and a for loop depends on the specific requirements of the problem and the desired control flow.\\nThe above code demonstrates the real-world application of a do-while loop in Python. It simulates a login process where the user is prompted to enter their username and password. The loop continues until the user provides the correct credentials.\\n\\nThe benefits of using a do-while loop, compared to a traditional for loop, are:\\n1. It ensures at least one iteration: With a do-while loop, the code block is executed at least once before checking the loop condition. This is useful when you want to guarantee the execution of a certain block of code before evaluating a condition.\\n2. It simplifies complex conditions: A do-while loop allows you to simplify complex loop conditions by evaluating them at the end of each iteration. This can make the code more readable and manageable, especially when dealing with multiple conditions.\\n\\nHowever, Python does not have a built-in do-while loop construct like some other programming languages. To achieve a similar behavior, a while loop is used, and an additional conditional check is placed at the beginning or end of the loop.\\n\\nIn complex problem-solving scenarios, a do-while loop can be advantageous when you need to execute a block of code at least once and then continue based on a condition. For example, in a login process, you want to prompt the user for credentials at least once before validating them.\\n\\nOverall, the decision to use a do-while loop or a for loop depends on the specific requirements of the problem and the desired control flow. While a for loop is typically used when the number of iterations is known in advance, a do-while loop is beneficial when you need to ensure the execution of a block of code at least once and then continue based on a condition.'}, {'from': 'human', 'value': 'The login process could be more efficient by breaking the loop after a certain number of failed attempts. Also, consider handling potential exceptions during user input.'}, {'from': 'gpt', 'value': \"I've updated the login algorithm, which now asks the user to input their username and password up to a maximum of three attempts. If the login attempts exceed three, the process will terminate.\\n\\nAlso, I have now introduced an exception handling construct that captures any potential exceptions during the user input process (i.e., when fetching username and password). If any exception arises, the system will print the exception and continue to the next loop iteration, asking for the authentication details again.\\n\\nThis implementation provides benefits, including:\\n1. Limiting the number of login attempts enhances application security and mitigates the risk of brute force attacks.\\n2. Exception handling increases the robustness of the application. It will continue to function in the face of unexpected errors during user input.\\n\\nLet's depict this in real-world applications:\\n1. Almost all secure systems, like banking or email services, incorporate a limit on failed login attempts. Once you exceed the specified attempts, you're either blocked for a while or alerted to verify your identity.\\n2. Exception handling is important to ensure smooth user experiences. For instance, when dealing with user inputs, a user might mistakenly close the input prompt or enter non-string characters where a string is expected. Such instances can be smoothly handled to prevent application crashes.\"}, {'from': 'human', 'value': 'Please generate the updated login code that includes a limit on login attempts and exception handling during user input.'}, {'from': 'gpt', 'value': \"The code has been executed successfully but, because it waits for user input (using `input()` function), it will not run as expected in this environment where interaction is not possible.\\n\\nHowever, you can copy the code provided and run it in your local Python environment. This code defines a login function with error handling features and a limit of 3 attempts to provide valid credentials. If the user fails more than 3 times, the login process will be terminated. \\n\\nTo run in an interactive Python environment, use the following snippet:\\n```python\\ndef login():\\n    authenticated = False\\n    attempts = 0\\n    while not authenticated and attempts < 3:\\n        try:\\n            username = input('Enter your username: ')\\n            password = input('Enter your password: ')\\n            if username == 'admin' and password == 'password':\\n                authenticated = True\\n                print('Login successful!')\\n                break\\n            else:\\n                print('Invalid username or password. Please try again.')\\n                attempts += 1\\n        except Exception as e:\\n            print(f'An error occurred during user input: {e}')\\n            continue\\n    if attempts == 3:\\n        print('You have exceeded the maximum number of login attempts.')\\n# Call the login function\\nlogin()\\n```\\nThis code is designed to be executed in an interactive setting where you can provide input directly. Run it in your local Python environment for full execution and interactive experience.\"}]}\n"
     ]
    }
   ],
   "source": [
    "print(code_feedback_save_data[2])\n",
    "# 保存为JSON Lines文件\n",
    "with open('/data/data/fintuning/llamafactory_input/code_feedback.jsonl', 'w', encoding='utf-8') as f:\n",
    "    for item in code_feedback_save_data:\n",
    "        json.dump(item, f, ensure_ascii=False)  # ensure_ascii=False 确保中文不乱码\n",
    "        f.write('\\n')  # 每行一个JSON对象"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# open hermes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1001551\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "# 读取JSON文件\n",
    "data_dir = \"/data/data/fintuning/OpenHermes-2.5/openhermes2_5.json\"\n",
    "with open(data_dir, 'r', encoding='utf-8') as f:\n",
    "    data = json.load(f)\n",
    "# 打印读取的数据\n",
    "print(len(data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'conversations': [{'from': 'human', 'value': 'Every day, a tree drops 7 leaves. How many leaves would it drop in a month of February in a non-leap year? Include your logic.'}, {'from': 'gpt', 'value': \"Here's the logic behind this:\\n\\n1. We know that February has 28 days in a non-leap year.\\n2. If the tree drops 7 leaves every day, then over the course of February, it would drop:\\n   Leaves dropped in February = Leaves per day * Days in February\\n   = 7 leaves * 28 days\\n   = 196 leaves\\n\\nSo, the tree would drop 196 leaves in February in a non-leap year.\"}], 'source': 'airoboros2.2', 'category': 'orca', 'skip_prompt_formatting': False}\n"
     ]
    }
   ],
   "source": [
    "print(data[0])\n",
    "with open('/data/data/fintuning/llamafactory_input/openhermes25.jsonl', 'w', encoding='utf-8') as f:\n",
    "    for item in data:\n",
    "        save_item = {\"conversations\":item['conversations']}\n",
    "        json.dump(save_item, f, ensure_ascii=False)  # ensure_ascii=False 确保中文不乱码\n",
    "        f.write('\\n')  # 每行一个JSON对象\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# WebInstructSub"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "179632\n",
      "179632\n",
      "179633\n",
      "179632\n",
      "179633\n",
      "179632\n",
      "179633\n",
      "179632\n",
      "179632\n",
      "179632\n",
      "179632\n",
      "179633\n",
      "179632\n"
     ]
    }
   ],
   "source": [
    "import pyarrow.parquet as pq\n",
    "from glob import glob\n",
    "all_file_dirs = glob(\"/data/data/fintuning/WebInstructSub/data/*parquet\")\n",
    "webinsturt_all_data = []\n",
    "for file_dir in all_file_dirs:\n",
    "    table = pq.read_table(file_dir)\n",
    "    print(len(table))\n",
    "    df = table.to_pandas()\n",
    "    question_list = df[\"question\"]\n",
    "    answer_list = df[\"answer\"]\n",
    "    L = len(question_list)\n",
    "    for i in range(L):\n",
    "        webinsturt_all_data.append(\n",
    "            {\n",
    "                \"instruction\":question_list.iloc[i],\n",
    "                \"input\":\"\",\n",
    "                \"output\": answer_list.iloc[i]\n",
    "            }\n",
    "        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2335220 {'instruction': 'I recently became fascinated by infinite nested radicals, first drawn attention to me from a question in my textbook about the value of $\\\\sqrt{1+\\\\sqrt{{1}+\\\\sqrt{{1}+\\\\sqrt{{1}...}}}}$ which turned out to be $\\\\phi$ when I worked it out, a rather beautiful result. I then tried to find a formula to evaluate the general case $$\\\\sqrt{x+\\\\sqrt{{x}+\\\\sqrt{{x}+\\\\sqrt{{x}...}}}}$$ which I succeeded in; it can be evaluated as $$\\\\frac{1+\\\\sqrt{1+4x}}{2}$$ Multiplying the nested radical which was equal to $\\\\phi$ by $x$ produces the following nested radical: $$\\\\sqrt{{x^2}+\\\\sqrt{{x^4}+\\\\sqrt{{x^8}+\\\\sqrt{{x^{16}}...}}}}$$ so this is equal to $x\\\\left(\\\\frac{1+\\\\sqrt5}{2}\\\\right)$. However, I have tried and failed to find the value of the following infinite square root: $$\\\\sqrt{x+\\\\sqrt{{x^2}+\\\\sqrt{{x^3}+\\\\sqrt{{x^4}...}}}}$$', 'input': '', 'output': 'The function of the OP is denoted as $f_1(x)$. For $0\\\\lt x\\\\ll 1$, we have $f_1(x)\\\\approx 1$. For large positive $x$, we have $$f_1(x)=\\\\sqrt{2x}\\\\left(1 + \\\\frac{1}{8\\\\sqrt{x}}- \\\\frac{5}{128x} + \\\\frac{85}{1024 \\\\sqrt{x^3}} - \\\\frac{1709}{32768 x^2} + \\\\frac{6399}{262144\\\\sqrt{x^5}} - \\\\ldots\\\\right).$$\\nInterestingly, there is no simple expression for $f_1(x)$, but it might be an algebraic function in the sense that there might be a polynomial $p(x,y)$ in two variables $x$ and $y$, such that $p(x,f_1(x))=0$.'}\n"
     ]
    }
   ],
   "source": [
    "print(len(webinsturt_all_data), webinsturt_all_data[1])\n",
    "with open('/data/data/fintuning/llamafactory_input/webinstruct_english.jsonl', 'w', encoding='utf-8') as f:\n",
    "    for item in webinsturt_all_data:\n",
    "        json.dump(item, f, ensure_ascii=False)  # ensure_ascii=False 确保中文不乱码\n",
    "        f.write('\\n')  # 每行一个JSON对象\n",
    "    "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pytorch",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "undefined.undefined.undefined"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
